Attribution: Javier Luraschi’s talk slides from SDSS 2019
source: https://hadoop4usa.wordpress.com/2012/04/13/scale-out-up/
MapReduce (Hadoop) was the original big kid on the block in terms of scaling out.
source: https://data-flair.training/blogs/spark-vs-hadoop-mapreduce/
Spark’s gains in speed and ease of use mean there is now a faster and smoother kid on the block…
source: Zaharia et al. (2016). Apache Spark: A Unified Engine For Big Data Processing
source: https://www.slideshare.net/SparkSummit/trends-for-big-data-and-apache-spark-in-2017-by-matei-zaharia
sparklyr + Databricks demo
Notebook 1 - Install sparklyr on Databricks cluster
Notebook 2 - Analysis demo
sparklyr Gitter!